Preface
The On-Policy Algorithms
on-policy: they don't use old data, which makes them weaker on sample efficiency. But this is for a good reason: they directly optimize the policy performance.
VPG (Vanilla Policy Gradient): basic, entry-level
TRPO (Trust Region Policy Optimization), PPO (Proximal Policy Optimization)
off-policy: they reuse old data very efficiently.
DDPG (Deep Deterministic Policy Gradient): foundational algorithm
Q-learning
TD3 (Twin Delayed DDPG)
SAC (Soft Actor-Critic)
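The difference in how the two families treat collected data can be sketched in a few lines. This is only an illustration, not code from any of the algorithms listed above; the class names OnPolicyBatch and ReplayBuffer are hypothetical.

```python
import random
from collections import deque


class OnPolicyBatch:
    """Holds transitions from the current policy only; emptied after each update."""

    def __init__(self):
        self.transitions = []

    def store(self, transition):
        self.transitions.append(transition)

    def get(self):
        # Hand over everything collected, then discard it: old data is never reused.
        batch, self.transitions = self.transitions, []
        return batch


class ReplayBuffer:
    """Keeps transitions from past policies and samples from them repeatedly."""

    def __init__(self, capacity):
        # The oldest entries drop out automatically once the buffer is full.
        self.transitions = deque(maxlen=capacity)

    def store(self, transition):
        self.transitions.append(transition)

    def sample(self, batch_size, rng=random):
        # Sampling removes nothing, so one transition can feed many updates.
        return rng.sample(list(self.transitions), batch_size)
```

An on-policy update calls get() once and then has to collect fresh data, while an off-policy update can call sample() as often as it likes on the same stored transitions. That is where the gap in sample efficiency comes from.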
Bellman Equation: a necessary condition for optimization methods such as dynamic programming to reach the optimum.
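To see the equation at work, here is a small value-iteration sketch. The optimality form of the equation says V(s) = max over a of [r(s, a) + gamma * V(s')], and value iteration applies the right-hand side repeatedly until the values stop changing. The two-state MDP, its rewards, and the discount factor below are made up for illustration.

```python
# Hypothetical 2-state, 2-action deterministic MDP:
# MDP[state][action] = (next_state, reward)
MDP = {
    0: {0: (0, 0.0), 1: (1, 1.0)},
    1: {0: (0, 0.0), 1: (1, 2.0)},
}
GAMMA = 0.9  # assumed discount factor


def bellman_backup(values, state):
    """Right-hand side of the Bellman optimality equation for one state."""
    return max(reward + GAMMA * values[nxt] for nxt, reward in MDP[state].values())


def value_iteration(tol=1e-10):
    """Apply the backup to every state until the largest change is below tol."""
    values = {s: 0.0 for s in MDP}
    while True:
        new_values = {s: bellman_backup(values, s) for s in MDP}
        if max(abs(new_values[s] - values[s]) for s in MDP) < tol:
            return new_values
        values = new_values


optimal_values = value_iteration()
```

For this toy problem the values settle near 19 for state 0 and 20 for state 1. At that point, applying bellman_backup again leaves them unchanged, which is exactly the condition the Bellman equation states.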
Code Format
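The code comes in two kinds of file:
The algorithm file holds the core code, which contains the program's run logic. Running this code directly in a Gym environment works better.
The core file holds the auxiliary code needed to implement the program.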
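As a rough picture of that split, the sketch below puts both halves in one listing. The helper discount_cumsum and the function run are illustrative stand-ins, not the contents of any real algorithm or core file.

```python
# --- what a core file might hold: reusable helpers (hypothetical example) ---
def discount_cumsum(rewards, gamma):
    """Discounted reward-to-go for every timestep of one episode."""
    out = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        out[t] = running
    return out


# --- what an algorithm file might hold: the run logic that calls the helpers ---
def run(episodes, gamma=0.99):
    """Stand-in for a training loop: turns each episode's rewards into returns."""
    return [discount_cumsum(rewards, gamma) for rewards in episodes]
```

Keeping helpers like this out of the algorithm file leaves the run logic short enough to read from top to bottom.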
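Setting Up the Reinforcement Learning Environment
Note: the default Python version on my Ubuntu is 2.7, so before installing anything below, switch the default to python3 as follows: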
echo alias python=python3 >> ~/.bashrc
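After opening a new terminal, it may be worth confirming which interpreter the python command actually starts. A shell alias only applies to interactive shells, so scripts that call python on their own can still pick up 2.7. The function name is_python3 is just an illustrative choice.

```python
import sys


def is_python3(version_info=sys.version_info):
    """True when the given interpreter version is Python 3 or newer."""
    return version_info[0] >= 3


if __name__ == "__main__":
    # Show which executable is running and whether it is Python 3.
    print(sys.executable, "Python 3" if is_python3() else "not Python 3")
```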
Reference links
- https://blog.csdn.net/will_ye/article/details/81071179
- https://blog.csdn.net/u012424106/article/details/79521173
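Install Anaconda, which ships with Python 3.7 (CUDA is not installed yet).
Use Anaconda to create an environment named spinningup.
source activate spinningup
activates the environment, and
source deactivate spinningup
closes the current environment.
Install OpenMPI.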
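Install Spinning Up. It is installed with pip from a clone of its repository, as in the project's installation instructions:
git clone https://github.com/openai/spinningup.git
cd spinningup
pip install -e .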
Install gym, in this order: mujoco, then mujoco-py, then gym.
When installing mujoco, obtain mjkey.txt. Note: first run
chmod +x ./getid_linux
and then run
./getid_linux
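Install mujoco-py. Check which Python version Anaconda ships with and don't mix it up with the system one. Use conda to create an environment named mujoco-py.
conda activate mujoco-py
activates it, and
conda deactivate
closes it.
Install gym; the usual installation works.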
Install tensorflow, making sure it matches the Python 3.7 in the environment, using
pip install --ignore-installed --upgrade
Install baseline.
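Once everything is installed, a quick check from inside the activated environment can show which packages Python can actually find. The import names in REQUIRED are my assumption of what the packages above install as; adjust them if yours differ.

```python
import importlib.util


def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed import names for the packages installed above.
REQUIRED = ["gym", "mujoco_py", "tensorflow", "baselines", "spinup"]

if __name__ == "__main__":
    print("missing:", missing_modules(REQUIRED))
```

An empty list means every module was found. Anything listed is either not installed or was installed into a different conda environment.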
Installing the TORCS Environment
TORCS is installed directly in the mujoco-py environment created with Anaconda, using the python3 and OpenAI-gym inside it.
Reference links
- https://blog.csdn.net/wgbarry/article/details/82827981
- https://blog.csdn.net/coolsunxu/article/details/83962601
- https://blog.csdn.net/ss910/article/details/77618425
Startup steps
- sudo torcs
- python snakeoil3_gym.py
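Controls
- F2 switches the camera view; the cross at the lower right of the screen shows right turn, left turn, throttle, and brake.
Installing OpenCV under Anaconda
Because ROS is installed, import cv2 raises the error below; fixes can be found online.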
ImportError: /opt/ros/kinetic/lib/python2.7/dist-packages/cv2.so: undefined symbol: PyCObject_Type
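One workaround commonly suggested for this error is to take the ROS directories off Python's search path before importing cv2, so that the copy from the conda environment is found first. The sketch below assumes ROS lives under /opt/ros/, as in the error message; the helper name without_ros_paths is hypothetical.

```python
import sys


def without_ros_paths(paths, prefix="/opt/ros/"):
    """Return the search-path entries that do not live under the ROS install prefix."""
    return [p for p in paths if not p.startswith(prefix)]


# Drop the ROS entries in place, before cv2 is imported anywhere in the program.
sys.path[:] = without_ros_paths(sys.path)
# import cv2  # uncomment once OpenCV is installed in the active conda environment
```

This only affects the running process, so ROS itself is left untouched. Scripts that need both ROS and the Anaconda cv2 would have to import cv2 before restoring the removed paths.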